
Lyapunov Learning at the Onset of Chaos

Benati, Matteo, Londei, Alessandro, Lanzieri, Denise, Loreto, Vittorio

arXiv.org Artificial Intelligence

Handling regime shifts and non-stationary time series in deep learning systems presents a significant challenge. In online learning, newly introduced information can disrupt previously stored data and alter the model's overall paradigm, especially with non-stationary data sources. It is therefore crucial for neural systems to adapt quickly to new paradigms while preserving the past knowledge that remains relevant to the overall problem. In this paper, we propose a novel training algorithm for neural networks called Lyapunov Learning. This approach leverages the properties of nonlinear chaotic dynamical systems to prepare the model for potential regime shifts. Drawing inspiration from Stuart Kauffman's Adjacent Possible theory, we exploit local unexplored regions of the solution space to enable flexible adaptation. The neural network is designed to operate at the edge of chaos, where the maximum Lyapunov exponent, which quantifies a system's sensitivity to small perturbations, evolves around zero over time. Our approach yields significant improvements in experiments involving regime shifts in non-stationary systems. In particular, we train a neural network to handle an abrupt change in the parameters of the Lorenz chaotic system. The network equipped with Lyapunov Learning significantly outperforms regular training, improving the loss ratio by about 96%.
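The maximum Lyapunov exponent at the heart of this method can be estimated numerically with the classic two-trajectory (Benettin-style) procedure: evolve a reference and a slightly perturbed trajectory, renormalize their separation at each step, and average the log stretching rate. A minimal sketch on the Lorenz system (not the paper's code; the standard parameters are known to give an exponent of roughly 0.9):

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, s, dt):
    """One classical Runge-Kutta step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def max_lyapunov(n_steps=20000, dt=0.01, d0=1e-8, transient=1000):
    """Benettin-style estimate: track a nearby trajectory, renormalize
    its separation each step, and average the log stretching rate."""
    s = np.array([1.0, 1.0, 1.0])
    for _ in range(transient):          # settle onto the attractor first
        s = rk4_step(lorenz, s, dt)
    p = s + np.array([d0, 0.0, 0.0])    # perturbed companion trajectory
    log_sum = 0.0
    for _ in range(n_steps):
        s = rk4_step(lorenz, s, dt)
        p = rk4_step(lorenz, p, dt)
        d = np.linalg.norm(p - s)
        log_sum += np.log(d / d0)
        p = s + (p - s) * (d0 / d)      # rescale separation back to d0
    return log_sum / (n_steps * dt)

lam = max_lyapunov()  # ~0.9 for the standard Lorenz parameters
```

Training "at the onset of chaos" then amounts to steering an analogous exponent of the network's own dynamics toward zero rather than computing it for an external system.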


CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models

Gong, Zi, Yu, Hang, Liao, Cong, Liu, Bingchang, Chen, Chaoyu, Li, Jianguo

arXiv.org Artificial Intelligence

Multi-task learning (MTL) benefits the fine-tuning of large language models (LLMs) by providing a single model with improved performance and generalization ability across tasks, presenting a resource-efficient alternative to developing separate models for each task. Yet, existing MTL strategies for LLMs often fall short by either being computationally intensive or failing to ensure simultaneous task convergence. This paper presents CoBa, a new MTL approach designed to effectively manage task convergence balance with minimal computational overhead. Utilizing Relative Convergence Scores (RCS), Absolute Convergence Scores (ACS), and a Divergence Factor (DF), CoBa dynamically adjusts task weights during the training process, ensuring that the validation losses of all tasks progress toward convergence at an even pace while mitigating the issue of individual task divergence. The results of our experiments on three disparate datasets show that this approach not only fosters equilibrium in task convergence but also enhances the LLMs' performance by up to 13% relative to the second-best baselines. Code is open-sourced at https://github.com/codefuse-ai/MFTCoder.
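CoBa's precise RCS, ACS, and DF formulas are defined in the paper; as a rough, hypothetical sketch of the convergence-balancing idea alone (slower-converging tasks receive larger weights; the divergence factor is omitted), one could re-weight tasks by the recent slope of their validation losses:

```python
import numpy as np

def convergence_weights(val_loss_hist, window=5, tau=1.0):
    """Hypothetical simplification of convergence balancing: weight each
    task by the recent slope of its validation loss, so slower-converging
    tasks receive more weight. CoBa's actual RCS/ACS/DF terms are richer.

    val_loss_hist: (num_tasks, num_steps) per-task validation losses.
    """
    hist = np.asarray(val_loss_hist)[:, -window:]
    steps = np.arange(hist.shape[1])
    # Least-squares slope of each task's recent validation-loss curve.
    slopes = np.array([np.polyfit(steps, h, 1)[0] for h in hist])
    # Normalize by the current loss so differently scaled tasks compare fairly.
    rel = slopes / (hist[:, -1] + 1e-8)
    # Softmax: a flatter (less negative) relative slope yields a larger weight.
    w = np.exp(rel / tau)
    return w / w.sum()
```

Note that a diverging task (positive slope) would also receive more weight under this toy scheme, which is exactly the failure mode CoBa's divergence factor exists to counteract.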


BayesBlend: Easy Model Blending using Pseudo-Bayesian Model Averaging, Stacking and Hierarchical Stacking in Python

Haines, Nathaniel, Goold, Conor

arXiv.org Machine Learning

Averaging predictions from multiple competing inferential models frequently outperforms predictions from any single model, provided that the models are optimally weighted to maximize predictive performance. This is particularly the case in so-called $\mathcal{M}$-open settings where the true model is not in the set of candidate models, and may be neither mathematically reifiable nor known precisely. This practice of model averaging has a rich history in statistics and machine learning, and there are currently a number of methods to estimate the weights for constructing model-averaged predictive distributions. Nonetheless, few existing software packages can estimate model weights from the full variety of methods available, and none blend model predictions into a coherent predictive distribution according to the estimated weights. In this paper, we introduce the BayesBlend Python package, which provides a user-friendly programming interface to estimate weights and blend multiple (Bayesian) models' predictive distributions. BayesBlend implements pseudo-Bayesian model averaging, stacking and, uniquely, hierarchical Bayesian stacking to estimate model weights. We demonstrate the usage of BayesBlend with examples of insurance loss modeling.
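Independently of BayesBlend's own API, pseudo-Bayesian model averaging reduces to a softmax over per-model expected log predictive density (ELPD) estimates. A generic sketch (function names are ours, not BayesBlend's):

```python
import numpy as np

def pseudo_bma_weights(log_lik):
    """Pseudo-Bayesian model averaging: a softmax over per-model expected
    log predictive densities (e.g. pointwise LOO log densities).

    log_lik: (num_models, num_obs) pointwise log predictive densities.
    """
    elpd = np.asarray(log_lik).sum(axis=1)  # per-model ELPD estimate
    w = np.exp(elpd - elpd.max())           # shift by the max for stability
    return w / w.sum()

def blend_predictions(weights, preds):
    """Blend per-model predictions (first axis indexes models) by the weights."""
    return np.tensordot(weights, np.asarray(preds), axes=1)
```

Stacking and hierarchical stacking replace this closed-form softmax with an optimization (or a full Bayesian model) over the weights, which is where a dedicated package earns its keep.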


The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models

Li, Conglong, Zhang, Minjia, He, Yuxiong

arXiv.org Artificial Intelligence

Recent works have demonstrated great success in pre-training large-scale autoregressive language models on massive GPUs. To reduce the wall-clock training time, a common practice is to increase the batch size and learning rate. However, such practice is often brittle and leads to a so-called stability-efficiency dilemma: increasing the batch size and learning rate improves training efficiency but can also result in training instability, leading to poor generalization accuracy or failed runs. To better understand this phenomenon, we conduct an in-depth analysis of large-scale pre-training experiments replicating the GPT-2 model. We find a strong correlation between training instability and extreme values of gradient variance, and that samples with long sequence lengths contribute to these extreme gradient variance values, especially at the beginning of training, indicating that long sequence lengths can be a main source of training instability. Based on this analysis, we present a Sequence Length Warmup method that aims to solve the training stability-efficiency dilemma. Experiments replicating GPT-2 models show that our approach enables stable training with an 8x larger batch size and a 4x larger learning rate, whereas the baseline approach struggles with training instability. To achieve the same or better zero-shot evaluation results, our method reduces the required number of training tokens and wall-clock time by up to 2.2x and 3.7x, respectively. Experiments replicating the GPT-3 model (125M) show that our approach enables stable training with an 8x larger batch size and a 40x larger learning rate, and retains 99% of the zero-shot accuracy on 11 tasks using 10x less data and 17x less time compared to the original GPT-3 training recipe, while the baseline diverges under the same settings and retains only 95% of the accuracy at a lower learning rate.
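The warmup itself can be as simple as a linear ramp on the sequence length; a hypothetical sketch (the paper's exact schedule and hyperparameters may differ):

```python
def seq_len_schedule(step, warmup_steps, min_len=64, max_len=2048, multiple=8):
    """Hypothetical linear sequence-length warmup: train on short sequences
    first and ramp up to the full context length over `warmup_steps`.
    The paper's exact schedule and hyperparameters may differ."""
    if step >= warmup_steps:
        return max_len
    length = int(min_len + (step / warmup_steps) * (max_len - min_len))
    # Round down to a hardware-friendly multiple, never below min_len.
    return max(min_len, (length // multiple) * multiple)

# Usage: truncate each batch to the scheduled length, e.g.
#   batch = batch[:, :seq_len_schedule(step, warmup_steps=1000)]
```

Truncating to short sequences early keeps per-sample gradient variance low exactly when the analysis above says instability is most likely.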


Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

Frei, Spencer, Vardi, Gal, Bartlett, Peter L., Srebro, Nathan, Hu, Wei

arXiv.org Artificial Intelligence

The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an $\ell_2$-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training. We provide experiments which suggest that a small initialization scale is important for finding low-rank neural networks with gradient descent.
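The single-step rank collapse can be checked numerically by comparing the stable rank (a smooth proxy for rank) of the first-layer weights before and after one large gradient step from a small initialization. A self-contained sketch with illustrative dimensions (not the paper's experimental settings):

```python
import numpy as np

def leaky_relu(z, alpha=0.1):
    return np.where(z > 0, z, alpha * z)

def stable_rank(W):
    """||W||_F^2 / ||W||_2^2: a smooth proxy for the rank of W."""
    s = np.linalg.svd(W, compute_uv=False)
    return (s ** 2).sum() / s[0] ** 2

rng = np.random.default_rng(0)
n, d, m, alpha = 20, 500, 50, 0.1                 # illustrative sizes only
X = rng.standard_normal((n, d)) / np.sqrt(d)      # high-dim data: rows nearly orthogonal
y = rng.choice([-1.0, 1.0], size=n)
W = 1e-6 * rng.standard_normal((m, d))            # small first-layer initialization
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer

# One gradient-descent step on the logistic loss (1/n) sum_i log(1 + exp(-y_i f(x_i))).
pre = X @ W.T                                     # (n, m) pre-activations
f = leaky_relu(pre, alpha) @ a                    # network outputs
lgrad = -y / (1.0 + np.exp(y * f))                # d loss_i / d f_i
phi_prime = np.where(pre > 0, 1.0, alpha)         # leaky ReLU derivative
grad_W = ((lgrad[:, None] * phi_prime * a[None, :]).T @ X) / n
W_new = W - 10.0 * grad_W                         # one large step from small init

rank_before, rank_after = stable_rank(W), stable_rank(W_new)
```

With a small initialization the gradient dominates the update, and its rows concentrate around a single shared direction, so the stable rank drops sharply after one step, consistent with the paper's rank-at-most-two picture.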


1 Growth Stock Down 85% to Buy Right Now

#artificialintelligence

It's no secret the technology sector of the stock market has been crushed this year. The Nasdaq 100 index, a widely followed benchmark for high-growth tech companies, has declined by 29% in 2022 so far. But many individual stocks have been hit even harder, particularly those focused on serving consumers, since that focus makes them more vulnerable to the broader economic slowdown. Interest rates have been rising because inflation recently topped a 40-year high, and that's placing a stranglehold on people's spending power. Still, some consumer-centric companies have managed to maintain rapid growth rates in this difficult period.


2 Artificial-Intelligence Growth Stocks Shaping the Future of Technology

#artificialintelligence

Innovative technologies have regularly reshaped the world. In the last few decades, inventions like the personal computer, the internet, and the smartphone have dramatically enhanced human productivity, while creating tremendous wealth in the process. And artificial intelligence (AI) promises to be the next transformative technology. In fact, consulting firm McKinsey estimates that AI could boost global economic output by 16% (or $13 trillion) between 2018 and 2030. Companies like Nvidia (NVDA 1.74%) and Lemonade (LMND -6.03%) could be major beneficiaries of that trend because both are using AI to shape the future of technology.


GuideOne Selects Betterview

#artificialintelligence

Betterview, an InsurTech provider of actionable property intelligence to property and casualty (P&C) insurance companies, is pleased to announce that GuideOne Insurance Company (GuideOne) has selected the Betterview Property Intelligence & Risk Management Platform for implementation. GuideOne, a leading provider of coverage for religious organizations, educational institutions, and nonprofit and human services organizations across all 50 states for over 75 years, needed a solution to increase underwriting efficiency and strengthen risk management processes for commercial properties. Structurally, religious organizations often have complex roofs and pose a greater risk to insurers because the buildings are vacant most of the week. This puts such organizations and facilities at greater risk of large losses when they are not sufficiently managed, maintained, or monitored. "GuideOne works differently than other insurers," said Betterview co-founder and chief operations officer David Tobias.


Gradient AI Joins Guidewire Insurtech Vanguards Program

#artificialintelligence

BOSTON--(BUSINESS WIRE)--Gradient AI, a leading enterprise software provider of artificial intelligence (AI) solutions for the insurance industry, announced that the company has joined Guidewire's Insurtech Vanguards program, a new initiative led by property and casualty (P&C) cloud platform provider Guidewire (NYSE: GWRE), to help insurers learn about the newest insurtechs and how to best leverage them. "Guidewire is one of the most recognized platform providers in the insurance industry today, and we are proud to be working with the company," said Stan Smith, founder and CEO, Gradient AI. "As a part of the Guidewire Insurtech Vanguards program, we look forward to helping insurers improve underwriting and claim processes with our AI-powered insurance solutions." Insurtech Vanguards is a community of select startups and technology providers that are bringing novel solutions to the P&C industry. As part of the program, Guidewire provides strategic guidance to and advocates for the participating insurtechs, while connecting them with Guidewire's P&C customers. "Gradient AI is an effective, innovative, and proven insurance solution providing insurers the intelligence needed to significantly improve their efficiency and profitability in claims and underwriting operations," said Laura Drabik, chief evangelist, Guidewire.


How is AI improving profitability?

#artificialintelligence

"The first step in the process is to identify opportunities of similarly situated or homogeneous risks. Once we know what product we're underwriting and what the target risks look like, we take those characteristics, have our digital partners scan the web for publicly available information, and use their AI engine to identify any targets by geographic region that meet those risk characteristics," he explained. This information is then passed on to Fortegra's underwriting agency to assess opportunities for growth, improving the overall efficiency of the sales process. Data can now play a critical role, applied through an AI engine to ensure the hierarchy of risk characteristics is properly set. Kahlbaugh highlighted that this strategy allows underwriters and agents to be far more productive, and that applying technology to improve both relationships ultimately translates to better risks, lower loss ratios, and higher commission profits.